ABSTRACT



An automatic, language-independent syntax error detection, recovery,
and correction system for LR(k) grammars is proposed. The grammar
involved is required to have a reverse that is also LR(k). The
implications of and justification for this requirement are discussed.

Given that the grammar is both LR(k) and RL(k), forward and reverse
parsers localize errors and define the left and right error context,
providing a strong base from which error analysis may proceed. Possible
deterministic and heuristic corrective actions to follow error analysis
are presented. The definition and selection of keys, the terminal
symbols of the grammar that enable the reverse parser to be engaged
upon error detection, are discussed.

A model of the proposed system, implemented in an XPL compiler for
a large ALGOL-like grammar, is described, and the results of test
programs are illustrated and discussed.

Possible extensions to the system are presented and areas requiring 
further analysis are defined. 

I. INTRODUCTION



Most compilers and compiler writing systems have some kind of error 
detection and recovery mechanisms built in. Most provide a degree of 
error analysis and indicate to the user an error type and an approximate 
position of the error in the input stream. Diagnostic messages range
from a reference number to full statements of the suspected cause followed
by parse histories. The suspected error symbol may be flagged with a
pointer or referenced by name or both. Some error analysis systems are 
even sophisticated enough to specify the error symbol exactly and state 
the correction necessary. 

If an error can be located precisely and defined without ambiguity,
then it seems logical that an immediate correction should be made and
processing allowed to continue. In general, it would seem that the
more exactly an error can be defined, the more efficiently the user's
and the computer's resources are utilized.

Research indicates that, despite appreciable effort, attempts to
design comprehensive error processing systems to accompany the
increasingly popular mechanical compilers, translator writer systems
(TWS), and compiler-compilers have not been very successful. The error
processing systems that do exist range from extremely simple
recovery-only schemes to fairly complex attempts at error correction.

It is proposed that an efficient automatic syntax error processing
system for LR(k) grammars can be defined. The system will operate as
a function of the grammar only, its parameters being defined by the
grammar analyzer and the grammar parsing function.
The objectives of such a system would be:

(1) to detect as many syntax errors as possible. Recovery systems that
simply delete code up to some predefined symbol do not afford the
programmer maximum exposure of his code to the analytical processes;

(2) to detect errors as early as possible, to enable a more tenable
recovery/correction scheme. Perhaps the most unsettling errors are
those diagnosed as "NO PRODUCTION APPLICABLE." This type of error is
generally associated with precedence parsers and arises when symbols
are pushed onto the stack after having been interpreted as locally
correct in context. The error is discovered only when a subsequent
symbol requires a reduction of the symbol stack and the error symbol
does not fit any production definition;

(3) to make as many viable corrections as possible, so as to allow a
continuous scan for maximum error detection, deleting code to effect
recovery only as a last resort;

(4) to avoid generating new syntax errors, by either correcting the
error or effecting a complete recovery. The inefficiency of correcting
an error (or, worse, recovering from one) only to alter the code so as
to create another syntax error is evident;

(5) to avoid passing errors into the parse stack, a condition that
gives rise to the difficulty of having to "undo" emitted code; and

(6) to define errors as exactly and completely as possible, if only to
provide more meaningful diagnostics should the correction attempt fail.

The error correcting system will be defined to operate in an XPL
compiler for LR(k) grammars whose reverse is also LR(k), and will be
capable of correcting detectable error sequences of up to n symbols,
where n is fixed when the compiler is constructed. For grammars meeting
this restriction, forward and reverse LR(k) parsers can be defined;
they will be employed to localize errors and define error context.
The left context of an error is defined by normal LR(k) parsing of
the input stream. The right context is definable by employing the
finite state machine representation of an LR(k) parser. Key symbols
that uniquely define states in the FSM are selected from the set of
terminal symbols for the grammar. When an error is detected, the next
n symbols are ignored and the input code following the error sequence
is scanned for a key symbol. When a key is located, the reverse parser
is engaged to parse back to the error sequence. The right context
thus defined, coupled with the left context provided by the forward
parser, forms a base from which error analysis may commence. Error
corrections are defined by generating symbol strings of up to n symbols
and comparing them with the error sequence.
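The key scan just described can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the input is already a list of terminal-symbol strings ending with an end-of-file symbol (written `_|_` here as an arbitrary choice), and that a hypothetical table `keys` maps each key symbol, end-of-file included, to the reverse-parser state it uniquely defines.

```python
from typing import Dict, List, Tuple


def find_key(tokens: List[str], error_pos: int, n: int,
             keys: Dict[str, int]) -> Tuple[int, int]:
    """Locate the first key beyond the error sequence.

    tokens    -- terminal symbols of the program, ending with end-of-file
    error_pos -- index of the symbol at which the forward parser stopped
    n         -- maximum error-sequence length, fixed at compiler construction
    keys      -- hypothetical table: key symbol -> reverse-parser state

    Returns (index of the key, reverse-parser state in which to start).
    """
    # The error may occupy tokens[error_pos : error_pos + n]; skip those
    # positions so the chosen key cannot be embedded in the error itself.
    for pos in range(error_pos + n, len(tokens)):
        symbol = tokens[pos]
        if symbol in keys:
            return pos, keys[symbol]
    # Unreachable if the stream ends with an end-of-file symbol that is a key.
    raise ValueError("no key found; is end-of-file registered as a key?")
```

For example, with n = 2, a semicolon key, and an error at index 3 of `BEGIN X := Y Y ; Z := 1 END _|_`, the scan starts at index 5 and the reverse parser is engaged at the semicolon; the state numbers in any such table are placeholders.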

The effectiveness of the system will be demonstrated by implementing
the procedure for a nontrivial ALGOL-like language. The system was
restricted from accessing the LR(k) parse stack. Though broad classes
of errors are correctable, this restriction defined a small set of
errors that are not easily corrected. In the event that the error
could not be corrected deterministically, the error analyzer was
defined always to select a symbol heuristically for insertion or
replacement as an attempted correction. In this situation, the analyzer
would continue to manipulate the symbol sequence between the forward
parser and the key in an attempt to achieve correct syntax, but there
were cases where the resulting correction became unrealistic. Hence,
it was necessary to limit the number of heuristic attempts that would
be made to correct the error. If a complete correction was not
effected within this many attempts, the process was aborted, code was
deleted through the key, and forward parsing was restarted at the
symbol following the key.
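The control flow of this bounded heuristic phase, with deletion through the key as the fallback, might look roughly as follows. The sketch is deliberately simplified and hypothetical: `accepts` stands in for re-running the parsers over the modified stream, each attempt inserts one candidate symbol in front of the error symbol of the unmodified stream (the system described here could also replace symbols and keep manipulating the sequence), and `max_attempts` plays the role of the limit on heuristic attempts.

```python
from typing import Callable, List, Sequence


def correct_or_delete(tokens: List[str], error_pos: int, key_pos: int,
                      candidates: Sequence[str],
                      accepts: Callable[[List[str]], bool],
                      max_attempts: int) -> List[str]:
    """Return the repaired token stream (the input list is not modified).

    Up to max_attempts candidate symbols are tried, one at a time, as an
    insertion in front of tokens[error_pos]. If none makes `accepts` return
    True, code from the error through the key is deleted, so that forward
    parsing resumes at the symbol following the key.
    """
    for symbol in list(candidates)[:max_attempts]:
        trial = tokens[:error_pos] + [symbol] + tokens[error_pos:]
        if accepts(trial):
            return trial
    # Heuristic limit reached: delete everything from the error through the key.
    return tokens[:error_pos] + tokens[key_pos + 1:]
```

With a limit of one attempt, a wrong first guess goes straight to deletion; raising the limit trades processing time for more chances to preserve the user's code.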






II. CURRENT SYSTEMS 



As early as 1963 the need for automatic error analysis and correction
systems to be part of syntax-directed compilers was recognized.
Efforts toward this goal resulted in the design of systems with
capabilities ranging from simple recovery to fairly complex recovery
and correction. A sample of the spectrum can be found by briefly
considering the work of Irons, McKeeman, Leinius, LaFrance, and Rich.

A. IRONS 

Irons [5] designed a parse algorithm which was guaranteed to manipu- 
late an input stream until it was syntactically correct for some defined 
grammar. Briefly, the mechanism involved carrying out all possible 
parses simultaneously. An error condition was defined when none of 
the current parses could continue. Error recovery and correction 
involved discarding the input stream from the error symbol until a 
symbol was found that would be syntactically correct for one of the 
existing parses. A string of symbols (including the null string) that 
would permit the selected parse to continue was then generated and 
inserted at the error point. Irons claimed the algorithm to be 
"relatively" efficient in terms of space and time requirements. 

However, it is conjectured that the algorithm would not be competitive 
in terms of space and time requirements if it was used on a larger 
grammar for a user-oriented language. 

The algorithm accomplishes error correction, but at a rather
primitive level, as it relies on the very simple mechanism of deleting
code rather than making any attempt to analyze the error relative to
its total environment. No attempt is made to ascertain the extent of
the error or its total local context. For example, if the error were a
missing punctuation symbol following a statement, then it is highly
probable that a correct following statement would be deleted in the
search for the punctuation mark. Automatic wholesale code deletion
such as this is a fairly severe price to pay for error correction,
particularly when program logic may be destroyed.

B. McKEEMAN 

In Reference 10, McKeeman illustrates the simple extreme. When an
error condition occurs, the input stream is scanned for an obvious
"stop" symbol for the language; the semicolon was used in the reference.
The interim code, including the error condition, is deleted and parsing
is re-initialized at the stop symbol.

The advantages of such a system are obvious: it is easily and
efficiently implemented, it is fast, and it does not create any new
syntax errors. However, as there is no attempt to correct an error,
there is no possibility of executing the program. Additionally, the
programmer loses the opportunity to have all of his code scanned for
syntactic continuity.

Example: IF ... e1 ... THEN IF ... e2 ... THEN ... ;

Error e2 will not be found in the process of deleting code between
error e1 and the semicolon.
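A sketch of this stop-symbol recovery makes the lost diagnostic concrete. It assumes the program is a plain list of symbol strings, with the stand-in names `e1` and `e2` for the two errors; everything between the first error and the semicolon is discarded unexamined.

```python
from typing import List, Tuple


def skip_to_stop(tokens: List[str], error_pos: int,
                 stop: str = ";") -> Tuple[int, List[str]]:
    """Stop-symbol recovery: return (restart index, discarded symbols).

    Parsing is re-initialized at the first stop symbol at or after the
    error; everything before it is deleted without being analyzed.
    """
    for pos in range(error_pos, len(tokens)):
        if tokens[pos] == stop:
            return pos, tokens[error_pos:pos]
    # No stop symbol remains: the rest of the program is discarded.
    return len(tokens), tokens[error_pos:]
```

Applied to `IF e1 THEN IF e2 THEN S ; T` with the error at `e1` (index 1), parsing restarts at index 7, and `e2` is among the discarded symbols, so it is never diagnosed.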

C. LEINIUS

Working with the LR(k) grammars, Leinius' parser constructor
defines a set of right context symbols to be used for error recovery
for each partial parse existing in each state of the parser [9].
Locating a member of the set in the input stream allows the completion
of a partial parse and the resultant reduction to be made. When an
error symbol is read, a choice of recovery procedures is offered. The
symbol string may be immediately scanned for one of the currently
applicable right context symbols, or the stack may be searched to
determine whether the symbol just read is a right context symbol for
some partial parse existing deeper in the stack. If the stack search
fails, then a decision must be made as to the state in which scanning
should commence to locate a right context symbol. This system is a
more refined attempt at error recovery, as the right context symbol
offers a more local choice than simply scanning ahead to a stop
symbol. But the system closely parallels Irons' in that wholesale
deletions can also take place while scanning for a required symbol.
More important, additional syntax errors can be generated.

Example: (X + e1 (X + X))

The second left parenthesis will be deleted while scanning to the
right for the required "X" with which to replace the error e1, and
that deletion will obviously create an additional error when the
parser attempts to read the second right parenthesis at the end of the
string.

D. LaFRANCE

LaFrance's error correction system employs groups of Floyd
productions redefining a BNF language, with the necessary error
productions built into the groups [8]. The error correction mechanism
is based essentially on pattern matching. For errors involving unique
productions, that is, productions that require no context check, the
symbol at the top of the stack and/or the next input symbol are
manipulated in accordance
with an ordered set of transposition, insertion, and deletion rules. 
Otherwise, the applicable productions are expanded to three symbols 
ahead. These triples are then compared against the next four symbols 
from the input stream to find a match in a set of twenty patterns which 
defines a correcting modification to the input stream. If no match is 
found, the input stream is scanned until a symbol is located which will 
permit completion of a partial parse and control is then passed to the 
appropriate group and processing continues. 

E. RICH

Rich [11] performed some preliminary work on an error correction
system for mixed strategy parsers based on a scheme suggested by
Gries [3]. It involves using legal triples to correct an error. A
legal triple is an ordered, syntactically correct sequence of three
terminal symbols of the grammar. For errors restricted to single
symbols, the triples would be applied either to the symbol prior to the
error and the error symbol, or to the symbol prior to the error and the
symbol following the error. In this manner a required deletion,
replacement, insertion, or transposition would be defined.
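Rich's exact rules are not reproduced here; the following is one plausible reading, written as a hypothetical sketch. Given a set of legal triples and the symbols before (`prev`) and after (`nxt`) a single error symbol (`err`), it lists the deletions, replacements, insertions, and transpositions that the triples would license.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def triple_corrections(legal: Set[Triple], prev: str, err: str,
                       nxt: str) -> List[Tuple[str, str]]:
    """List single-symbol corrections supported by the legal triples."""
    alphabet = sorted({s for t in legal for s in t})
    fixes: List[Tuple[str, str]] = []
    # Deletion: some legal triple has prev followed directly by nxt.
    if any(t[0] == prev and t[1] == nxt for t in legal):
        fixes.append(("delete", err))
    # Replacement: some other symbol x fits between prev and nxt.
    fixes += [("replace", x) for x in alphabet
              if x != err and (prev, x, nxt) in legal]
    # Insertion: some symbol x fits between prev and err.
    fixes += [("insert", x) for x in alphabet if (prev, x, err) in legal]
    # Transposition: swapping err and nxt yields the legal triple (prev, nxt, err).
    if (prev, nxt, err) in legal:
        fixes.append(("transpose", err))
    return fixes
```

With the single triple `X + Y` among the legal ones, the input `X * Y` yields only the replacement of `*` by `+`, while `X Y +` is ambiguous: deletion, insertion, and transposition are all licensed. This shows why such a scheme needs context beyond the triple itself to choose among corrections.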

Rich anticipated that error correction attempts would have to be 
limited and that such a system would require provisions to facilitate 
recovery from an error correction that was found to be wrong. This 
would entail saving all parsing information at the point of the error, 
perhaps in the form of a temporary parse stack operating locally in 
parallel with the main stack. More important, provisions for
cancelling any code emitted during an aborted error correction could
be required. Rich suggested that if a correction could not be applied,
then a unit of code (e.g., <STATEMENT>) would be deleted and a pseudo
statement (e.g., a diagnostic message) substituted.

III. PROPOSED SYSTEM



The basic mechanics of the system were initially conceived as
involving analysis of the input string following a syntax error. This
analysis, coupled with that which preceded the error, would provide a
more cohesive context in which to analyze the error, and would thus
enhance error localization and definition and increase the probability
of selecting the most applicable correction. Error analysis in this
environment would be more definitive than schemes that match patterns
of terminal symbol strings or extrapolate possible inputs from the
analysis available prior to the error.

A. LR(k) GRAMMARS

The LR(k) grammars were selected as the class to which the system 
would apply as they and LR(k) parsing enjoy several advantages over 
simple and mixed strategy precedence (MSP) techniques: (1) the class of 
LR(k) grammars includes the precedence grammars, (2) the LR(k) parse 
stack provides an accessible and complete parse history to any point 
during processing of the object string. This deterministic context 
should permit more confident error analysis, and (3) all syntax errors 
are detected in read or lookahead states in the form of "ILLEGAL SYMBOL 
PAIRS," thus, the LR(k) parse stack is syntax error free. 

LR(k) parsers may be represented by a characteristic finite state
machine (CFSM) [2], which has two essential kinds of active state: read
and lookahead. The lookahead states are required to resolve
stacking/reduction decisions; that is, the next k symbols in the input
stream define sufficient context to resolve the local conflict.
Associated with each state in the FSM is a unique accessing symbol: the
terminal or nonterminal symbol from the grammar that caused the
recognizer to enter that state. In Figure 1, the nonterminal symbol
<Block Body> is the result of a reduction made to a portion of the
symbol stream already processed onto the parse stack, and it is the
accessing symbol for read state 5. Read state 5 causes the next symbol
in the code stream, s1, to be read. If s1 is the symbol END, then a
transition is made to reduce state 8; if s1 is a semicolon, then a
transition is made to read state 36. These two symbols then become
accessing symbols for their respective transition states. Similarly,
the symbols BEGIN, END, ..., WRITEON are accessing symbols for their
respective states following read state 36. The terminal symbols that
are state accessing symbols will play a significant role in the
proposed system and are discussed below.
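The Figure 1 fragment can be encoded as a transition table, which also shows what it means for each state to have a single accessing symbol. Only the transitions named in the text are included; the state kinds and the dictionary encoding are illustrative assumptions, and a real parser table would contain many more entries.

```python
from typing import Dict, Optional, Tuple

# (state, symbol read) -> next state, for the fragment described in the text.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (5, "END"): 8,   # END after <Block Body>: enter reduce state 8
    (5, ";"): 36,    # semicolon after <Block Body>: enter read state 36
}
STATE_KIND: Dict[int, str] = {5: "read", 8: "reduce", 36: "read"}


def step(state: int, symbol: str) -> Optional[int]:
    """Next state, or None when (state, symbol) is an illegal symbol pair."""
    return TRANSITIONS.get((state, symbol))


def accessing_symbol(target: int) -> Optional[str]:
    """The single symbol labelling every transition into `target`.

    Returns None if this fragment records no transition into `target`
    (or, in a malformed table, more than one distinct symbol).
    """
    symbols = {sym for (_, sym), nxt in TRANSITIONS.items() if nxt == target}
    return symbols.pop() if len(symbols) == 1 else None
```

Reading `BEGIN` in state 5 returns `None`, which is exactly the "illegal symbol pair" by which an LR(k) parser detects a syntax error.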

The entire LR(k) parse stack is accessible and defines the complete
parse history. As LR(k) parsing is deterministic, each new state is a
unique transition from its predecessor. As a symbol string is
processed, this deterministic trace through the FSM continuously
confirms syntactic continuity as each state is entered. Therefore, it
is generally not necessary to access the entire stack to determine the
left context for a specific symbol.

B. LR(k)/RL(k) GRAMMARS 

To achieve error isolation and definition of error context, the 
stipulation was made that the grammar on which the error corrector 
would operate must be both LR(k) and RL(k). The construction of
an LR(k) parser for the reverse of the grammar would then enable bi-directional
analysis of an error in that both forward and backwards parsers should 
recognize a given error in a sentence from the language. 

It was fully appreciated that the above requirement was not
insignificant. Knuth [7] discussed the LR(k)/RL(k) relation briefly,
giving an example of a language for which an RL(k) grammar could be
constructed but an LR(k) grammar could not, for any k. The specific
problem that he described was encountered, in the reverse direction,
in the grammar used in the model. Given the two ALGOL-E sentential
forms:

FUNCTION <ID> (<ID>, <ID>, ..., <ID>);

and

READ (<VAR>, <VAR>, ..., <VAR>);

where <VAR> may be derived from <ID>, the input sequence

READ (AAA, BBB, CCC);

is deterministic when read from left to right because of the
differentiating reserved word READ, but is not LR(k) when read from
right to left, because the parser cannot decide whether or not to
reduce an identifier to <VAR> until the symbol READ has been
recognized. This conflict was resolved in the model grammar used in
this research by changing the read list delimiters from parentheses
to vertical bars.

(Two other similar changes to the grammar were required and will be 
described in Section IV.) 

The cost of sacrificing minor user-oriented features should not
necessarily bar a language from more efficient processing
techniques. What is involved here is the sacrifice of minor symbology
so as to permit automatic error processing of the grammar. Minor
modifications of this same nature to specific grammars may enable the
proposed system to apply to a significant set of interesting
languages. As the model grammar is not trivial, it provides a valid
example.

C. REVERSE PARSING

Error definition and correction were approached from the point of
view that they essentially involve the analysis of an error in its
environment, and that the probability of not making a mistake during
analysis is a function of the magnitude of the error environment
considered. Thus, when an error is detected, it becomes necessary to
read the input code that follows the error sequence and relate this
right context to that on the left. In this manner the error would be
localized. Pattern matching techniques, such as LaFrance's, accomplish
this by projecting ahead all productions applicable at the point of
error detection. This extension defines the set of all possible
correct symbol patterns that may syntactically follow the last accepted
symbol. LaFrance extrapolates all legal triples and is thus able to
correct most single and double symbol errors and, in some cases, triple
symbol errors, particularly those involving reordering of the generated
triples.

Except in the last case, at least one of the symbols in the generated 
triple was used to define the right context of the error. When one or 
more of the symbols in the triple were matched by symbols in the next 
four symbols from the input stream, corrections were based on the 
interpretation that the error extended from the symbol at which parsing 
halted to the start of the matching sequence. 

It would also be possible to define the right context by scanning the
input stream to the end and allowing the reverse parser to parse from
right to left. When the reverse parser stopped due to an error, the
right context would be defined to that point. This is immediately seen
to be a very impractical method, since the entire remainder of the
program must be read, and any later error would stop the reverse parser
before it returned to the first.

A means was needed to unambiguously engage reverse parsing at some 
intermediate symbol in the code stream beyond the error sequence. This 
would require the ability to uniquely define a parse state for that 
intermediate symbol. If a state could be so defined then, by the 
nature of LR(k) parsing, the parse history prior to that state could 
be inferred. Starting the reverse parser at an intermediate symbol
would, in essence, simulate having parsed from the end of the code
stream to that symbol.

If the symbol immediately following the error sequence could be
determined, and if this symbol were an FSM state accessing symbol, that
is, if it defined a unique state in the FSM, then an immediate transition
to that state could be made. Associated with each read and lookahead 
state is a defined set of terminal symbols any of which is syntactically 
correct with respect to the accessing symbol for that state. When the 
transition was made, the reverse parser would be in a position to 
immediately reference the last symbol in the error sequence. 

In this instance, only ordered pairs, rather than triples, would be
required for pattern matching, as this is all that is needed to span
the error sequence. The saving gained by constructing one less level
of a generation tree is immediately apparent.

However, an immediate extension was suggested. If the symbol 
immediately following the error will define a unique reverse parse 
state then it may be possible to select any terminal symbol that so 
defines a state, find this symbol in the input stream, transfer to the 
appropriate state, and parse back to the error. 

D. KEYS



The determination of symbols or keys that uniquely defined a 
reverse parser state was predicated on several requirements. Certain 
required attributes of a key were easily defined: the key should not be 
part of the error and it must appear in the code stream. 

To ascertain that the keys were located outside of the error sequence 
required restricting the maximum length of the error sequence to n 
symbols. Then scanning for the keys would commence n symbols after the 
point of error detection. 

The stipulation that the key must appear required that at least two 
symbols be designated keys. The first would be some symbol from the set 
of terminal symbols for the grammar and, to provide for the case where 
this symbol is not present in the balance of the input stream, the 
second symbol would be that used by the grammar to signify end-of-file. 

Also, while keys could be located well beyond the error sequence, 
they should be located close enough so as to minimize the probability 
of encountering a second error while parsing back to the first. 

A key that would specify a state in which reverse parsing could
commence was sufficient only for reverse parsing. To provide for the
case that the error was not correctable, it was also necessary that 
this key specify a state to which the forward parser could be transferred 
and restarted. 

If the grammar is structured, as is the model grammar, then keys
may be suggested by the delineators of the basic recursive forms. The
basic form of an ALGOL-E sentence was quickly discerned as the terminal
symbol BEGIN, followed by any number of declarations, each delineated
by a period, followed by at least one <Statement>, with semicolons
separating multiple <Statement>s, followed by the terminal symbol END.
The period, semicolon, and END were considered as possible keys.

A grammar analyzer with which to define keys was not designed;
however, a semi-mechanical analysis process was defined and applied to
the model grammar.

After excluding the symbols <Identifier>, <Number>, and <String>
from the model grammar, the intersection of the terminal accessing
symbols defining read states in the two parsers was found. The set
contained only the arithmetic operators, "OR", "AND", "(", ".", and
";", thereby eliminating END from the tentative list. Applying
intuitive arguments, the set was further reduced.
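The semi-mechanical step above amounts to a set intersection. The sketch below assumes each parser is summarized by a hypothetical table mapping state numbers to (accessing symbol, state kind); it also reports whether each surviving symbol accesses exactly one read state of the reverse parser, the uniqueness property examined below for the left parenthesis.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical parser summary: state number -> (accessing symbol, state kind).
ParserStates = Dict[int, Tuple[str, str]]

# Symbol classes excluded before the intersection, as in the text.
EXCLUDED = {"<Identifier>", "<Number>", "<String>"}


def read_accessors(states: ParserStates, terminals: Set[str]) -> Set[str]:
    """Terminal symbols (minus the excluded classes) that access a read state."""
    return {sym for sym, kind in states.values()
            if kind == "read" and sym in terminals and sym not in EXCLUDED}


def candidate_keys(forward: ParserStates, reverse: ParserStates,
                   terminals: Set[str]) -> Dict[str, bool]:
    """Map each tentative key to True if it accesses exactly one reverse read state."""
    common = read_accessors(forward, terminals) & read_accessors(reverse, terminals)
    reverse_reads = Counter(sym for sym, kind in reverse.values() if kind == "read")
    return {sym: reverse_reads[sym] == 1 for sym in sorted(common)}
```

A symbol such as END that accesses only a reduce state in the forward parser drops out of the intersection, mirroring the elimination of END described above.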

All of the symbols except the period and semicolon were disqualified
because they need not appear regularly in a code stream and they
defined illogical potential deletion units. Long strings of code
between the error sequence and the key would increase the probability
of encountering a second error, thereby causing the error correction
attempt to be aborted and all code to the key to be deleted. An
illogical deletion unit would result if the reserved word AND were
used as a key and the attempt at error correction failed. Though it may
be possible to delete code between two AND's and preserve syntactic
continuity in the remaining code, intuitively, that deletion would
violate the basic structure of the language.

The period and semicolon appeared to have both desired attributes.
Judging from the language, both occur fairly regularly and, more
important, the strings of code between either and an error are of
manageable lengths.
Additionally, the left parenthesis and the add and subtract signs
defined multiple states in which the reverse parser could be started.
It would be a simple matter to scan for (say) a left parenthesis, but
it would not be readily apparent in which state the reverse parser
should be started to process code back to the error.

Though a simplistic approach, this general analysis of the grammar 
suggested several variants and extensions to the definition, employment, 
and effects of keys. 

For example, the left parenthesis was found to be an accessing 
symbol for six read states in the reverse parser, three of which were 
independently unique. (The grammar analyzer employed to construct the 
parser was not designed to remove redundant states in the FSM, which 
it is possible to do.) As one of the prime objectives was to remain 
close to the error so as to avoid second errors as much as possible, it 
was seen that it could be significantly beneficial with respect to error 
correction capability to assign a symbol such as the left parenthesis 
as a key. The parenthesis is an often used symbol and its being 
designated a key would enable, in many cases, scanning and processing 
shorter strings of code. The ambiguity created by the multiple states
defined by the key could be resolved by providing for variable path
parsing via a system such as Irons'; that is, by starting the reverse
parser in each state defined by the key and allowing it to return to
the error. This may also yield a larger selection of possible error
corrections, thereby enhancing the system's overall ability.
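A sketch of the variable path idea, under a strong simplification: the reverse parser is modelled as a bare transition table over symbols, ignoring the parse stack and reductions that a real LR(k) reverse parser would perform. Each state defined by the key starts its own right-to-left path, and the index at which each path stops is recorded. The state numbers and table are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Transitions = Dict[Tuple[int, str], int]


def reverse_scan(trans: Transitions, state: int, tokens: List[str],
                 key_pos: int) -> int:
    """Parse right to left from the symbol before the key.

    Returns the index of the first symbol the path cannot accept, or -1 if
    it reaches the start of the stream.
    """
    pos = key_pos - 1
    while pos >= 0:
        nxt = trans.get((state, tokens[pos]))
        if nxt is None:
            return pos
        state, pos = nxt, pos - 1
    return -1


def variable_path(trans: Transitions, key_states: Iterable[int],
                  tokens: List[str], key_pos: int) -> Dict[int, int]:
    """Start one reverse path per state the key defines; map state -> stop index."""
    return {s: reverse_scan(trans, s, tokens, key_pos) for s in key_states}
```

In a toy run where the key defines states 20 and 21, the path from 20 might stop at index 1 and the path from 21 at index 2; the path that gets further back toward the error is the more plausible interpretation, though the others may still contribute candidate corrections.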

Secondly, only those symbols defining read states were considered
for the model; however, it could be beneficial not to restrict key
selection to that case. Through grammar analysis it may be possible
and practical to define more valuable keys by considering those
terminal symbols that define lookahead and reduce states in addition
to read states.

In fact, a natural extension of the preceding discussion might be
to consider only the symbol immediately following the maximum error
sequence and allow variable path parsing back to the origin of the
error. However, in the event that error correction failed, the
problems associated with error recovery would remain to be resolved.
It would be highly probable that the sequence of code between the
error point and the key would not be a convenient string to delete.
One possible solution would be to delete code to the first available
key that did define a logical deletion unit.

Consideration of the above possibilities was doubly motivated:
first, by the objective of keeping keys as close to the error as
practical, and second, by the surprising discovery that a set of fifty
terminal symbols was so severely reduced when the subset of those
symbols defining read states in both parsers was determined. It seems
very likely that there are interesting LR(k) grammars that would be
excluded from the proposed system if the definition of keys were
restricted to those symbols that define only read states in both
parsers.

E. PROCEDURE 

When an error is detected in either a read or lookahead state,
the corrector procedure requires stepping over n symbols to ensure that
the key selected is not embedded in the error string, scanning forward
until a key is encountered, and engaging the reverse parser in the
state prescribed. The reverse parser is allowed to parse backwards
until it either stops at the same point at which the forward parser
stopped or is stopped by encountering an error. If the length of
the symbol string between the two parsers is greater than n, then the
restriction on error magnitude has been violated; code is deleted
to the key and the forward parser is restarted at the symbol
following the key. If the number of symbols between the two parsers
is equal to m, 1 <= m <= n, then symbol strings of length m are
generated from the context of either parser and, via a set of pattern
matching rules such as those defined by LaFrance, the generated strings
are compared with the error string to define a symbol deletion,
insertion, replacement, and/or transposition. If m is equal to zero,
then the reverse parser has returned to the symbol recognized as an
error by the forward parser. The error may be quickly resolved by
intersecting the symbol sets associated with the two parse states,
thereby defining a replacement symbol. Alternatively, a deletion may be
defined by determining that both parsers would be satisfied by the
symbols that follow the error relative to either parser.

In the case that the reverse parser is not in an error condition
while reading the forward parser's error symbol, an insertion symbol
may be defined by intersecting the parse state symbol sets after
stepping the reverse parser to its next read or lookahead state.
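The case analysis of the two preceding paragraphs can be summarized in a small dispatcher. The indexing convention is an assumption made for the sketch: `forward_stop` is the index of the symbol the forward parser rejected, `reverse_stop` is the index of the symbol at which the reverse parser halted (equal to `forward_stop` when it returned all the way to the forward parser's error symbol and rejected it), or `None` when the reverse parser read that symbol without error. The gap length, called m here, is their difference. The returned strings are illustrative labels only.

```python
from typing import Optional, Set


def classify(forward_stop: int, reverse_stop: Optional[int], n: int) -> str:
    """Select the corrective procedure for one detected error."""
    if reverse_stop is None:
        # Reverse parser accepted the forward error symbol: try an insertion
        # after stepping it to its next read or lookahead state.
        return "insert"
    m = reverse_stop - forward_stop    # symbols between the two parsers
    if m < 0:
        raise ValueError("reverse parser stopped left of the forward parser")
    if m == 0:
        return "replace-or-delete"     # intersect the two states' symbol sets
    if m > n:
        return "delete-to-key"         # error magnitude restriction violated
    return f"match-length-{m}"         # generate length-m strings and compare


def common_symbols(forward_ok: Set[str], reverse_ok: Set[str]) -> Set[str]:
    """Symbols acceptable to both parsers: candidate replacements or insertions."""
    return forward_ok & reverse_ok
```

An empty result from `common_symbols` is the situation in which no insertion or replacement symbol is deterministically defined, and the heuristic phase takes over.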

In the event that all deterministic error correction attempts fail, 
it may be advantageous to heuristically select a symbol from the forward 
parser symbol set to either replace or be inserted in front of the error 
symbol and restart forward parsing rather than automatically proceed 
with code deletion. 
At the cost of the extra processing time required, a heuristic
attempt to correct an error would serve two purposes. It may provide
the necessary impetus to complete the correction or, even if the attempt
failed, it should suggest to the programmer an approach to correction
through the associated diagnostics.

Consider the case where the allowable error magnitude is one symbol
and the error is actually the omission of two symbols. For example:

X := -Y + IF A THEN ...,

where the symbols "Z" and ";" have been omitted. The forward parser
will detect an error when it attempts to access the symbol IF, and the
reverse parser will detect an error when accessing the plus sign.
Neither parser can be satisfied by any deletion of adjacent error
symbols, nor by the transposition of any symbol pair. Also, the
intersection of the symbol sets associated with each parser state will
be empty, so an insertion or replacement symbol will not be
deterministically defined. A heuristic attempt at correction may be
made at this point by selecting a symbol from the forward parser symbol
set for insertion in front of the error symbol (for the forward parser,
the error symbol is the word IF).

Obviously, by inspection, a choice is available. The selection 
would certainly include a number, another identifier, and a left 
parenthesis. Two of these three symbols would effectively reduce the 
remaining error to a single symbol and permit the deterministic processes 
to re-analyze the error. 

If the left parenthesis were selected, the gains would be less 
obvious. On the next analysis iteration the deterministic attempts 
would probably fail again; heuristically, however, another symbol would 
be inserted. 
How symbols are selected from applicable sets is also variable. 
Whether they are selected as they are ordered in the set or in reverse 
order may be problematic. However, a means to avoid issuing a duplicate 
of the previous choice would probably be required. 

In the manner described and within the confines of the error 
restrictions, the proposed error corrector accomplishes error detection as 
early as possible and defines error processing such that the error is 
not propagated to the stack. A strong deterministic attempt will be 
made to correct an error and, failing that, a heuristic choice of 
correction will be applied. 

Two other facilities would be required to support the proposed 
system: (1) an upper limit on the number of heuristically selected 
corrections that may be made for any one error must be specified, 
and code would be deleted only when this limit was reached; and (2) 
complete communication must be maintained with the programmer to ensure 
that, in the event error correction failed, the diagnostics would 
provide a complete history of corrector action, helping to isolate 
the error and perhaps allowing the user to quickly discern its true 
cause. 

The case in which the key symbol had been misplaced and itself 
constituted an error required consideration. No problem would arise 
if a key was located in the allowable error string, as this string would 
not be considered when scanning for keys. If the key was erroneously 
placed beyond the error string then the error restrictions would be 
violated; however, the violation would not be detected until the code 
sequence between the key and the following key was processed. The 
corrector would not recognize an erroneous key as such; hence, 
correction procedures would be applied to both strings of code, that 
preceding the erroneous key and that immediately following it. 

The possibility of defining symbol strings vice single symbols as 
keys, to alleviate the problem of keys being in error, was considered. 
These considerations were also motivated by the desire to place 
the keys as close to the error as possible to preclude encountering 
second errors. 

It may be possible to define ordered sets of terminal symbols such 
that their being located in the input stream would specify a unique 
start state for the reverse parser whose accessing symbol would be one 
of the elements of the set. For example, if the string <Operator> ( 
<Identifier> uniquely defined a reverse parser state such that the 
accessing symbol was a left parenthesis, then locating this 
string following an error might preclude the need to scan further 
for a semicolon or period. Thus, the possibility of encountering a 
second error while reverse parsing would be reduced. 
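Keying on a symbol string can be pictured as a table lookup made while scanning forward from the error. The Python sketch below is illustrative only: the key table, its state numbers, and the name `find_key` are hypothetical, since the thesis does not say how such a table would be derived from the reverse parser.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical key table. Each entry maps a terminal string to
# (reverse-parser start state, offset of the accessing symbol in the string).
# The state numbers are invented for illustration.
KEY_STRINGS: Dict[Tuple[str, ...], Tuple[int, int]] = {
    ("<Operator>", "(", "<Identifier>"): (17, 1),  # accessing symbol is "("
    (";",): (1, 0),                                 # primary key
    (".",): (2, 0),                                 # primary key
}
LONGEST_KEY = max(len(k) for k in KEY_STRINGS)


def find_key(tokens: Sequence[str], start: int) -> Optional[Tuple[int, int]]:
    """Scan tokens[start:] for the first key string.

    Returns (index of the accessing symbol, reverse-parser start state),
    or None if no key occurs before the end of the input.
    """
    for i in range(start, len(tokens)):
        # At each position prefer the longest, most specific key.
        for n in range(LONGEST_KEY, 0, -1):
            entry = KEY_STRINGS.get(tuple(tokens[i:i + n]))
            if entry is not None:
                state, offset = entry
                return i + offset, state
    return None
```

Because the string key is found before the trailing semicolon, the reverse parser in this sketch would start at the left parenthesis instead of scanning on to the primary key.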

The above concept of keying on symbol strings may be extendable to 
enable the forward parser to perceive or extrapolate symbol sets based 
on the state it was in when the error was recognized and the left 
context, that, if located in the code stream following the error, would 
define unique start states for the reverse parser. It may be possible 
to define a set or hierarchy of such strings through a complex analysis 
of the forward and reverse parser interface. Continuing the example 
above, for a given forward parser state there may be several contexts 
in which a left parenthesis may be taken such that each uniquely 
defines a reverse parser start state. 
Not locating such strings following an error would not necessarily 
constitute a second error, and would require that hierarchical sets such 
as these also include any "primary" keys defined for the grammar, such 
as the period and semicolon previously discussed. If, for example, the forward 
parser was currently parsing an <If Statement>, and locating the 
reserved word THEN would enable engaging the reverse parser, 
not locating that key should not automatically constitute a second error. 
That particular key may be involved directly in the detected error 
sequence, and scanning should continue, searching for the next definable 
key in the key set for <If Statement>. 
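A hierarchical key set of this kind might be scanned as in the following Python sketch. The context names, the contents of `CONTEXT_KEYS`, and the function `next_key` are assumptions made for illustration; the point shown is that the primary keys are always part of the set, so a missing THEN does not by itself count as a second error.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

PRIMARY_KEYS: FrozenSet[str] = frozenset({";", "."})

# Hypothetical context-specific keys, chosen by the construct the forward
# parser was working on when the error was detected.
CONTEXT_KEYS: Dict[str, FrozenSet[str]] = {
    "<If Statement>": frozenset({"THEN"}),
}


def next_key(tokens: Sequence[str], start: int,
             context: str) -> Optional[Tuple[int, str]]:
    """Return (index, key) of the first key at or after `start`.

    The key set always includes the primary keys, so if a context key is
    itself part of the error the scan simply continues to the next key.
    """
    keys = CONTEXT_KEYS.get(context, frozenset()) | PRIMARY_KEYS
    for i in range(start, len(tokens)):
        if tokens[i] in keys:
            return i, tokens[i]
    return None
```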
IV. IMPLEMENTATION 



For the purpose of implementation of the error recovery system 
defined, considerations were restricted to those syntax errors involving 
only single symbols and transposition of symbol pairs. Extensions of 
the system to include errors of greater complexity and scope will be 
discussed at the conclusion. 

A. COMPILER 

A basic model of the proposed error correction system was implemented 
in an XPL compiler for ALGOL-E, a non-trivial ALGOL-like language (134 
productions, 50 terminal symbols, 74 nonterminal symbols). A listing 
of the grammar is provided in the Appendix. The model is independent 
of semantics, its parameters being derived solely from the forward and 
reverse parsers, i.e., parse states and associated symbol sets. 

The compiler was constructed from an existing ALGOL-E compiler 
employing MSP parsing [6] and an XPL skeleton compiler written by 
DeRemer [1] for his SLR(k) parser. Figure 2 shows some of the detail 
in the construction of the hybrid model compiler. Studies have shown 
that the SLR(k) parser constructor and the resulting parser require 
significantly less space and time than the MSP parsers [2,4]. This was 
also found to be the case in this application. The SLR(k) parser for 
ALGOL-E required approximately 64 percent of the space required for the 
MSP parser for the same grammar. This was considered significant as 
the error correction technique to be implemented would require both a 
forward and a reverse parser. 
The SLR(k) parser constructor was defined and implemented by DeRemer. 
The gained efficiency of his system over other basic LR(k) parser 
constructors was achieved by constructing an LR(0) parser for the grammar 
and then adding lookahead states only where they were needed. This approach 
resulted in faster construction and reduced parser size. 
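The first step of this strategy, finding the LR(0) states that actually need lookahead, is the standard inconsistency test. The Python sketch below is our illustration of that test, not DeRemer's code; the item representation `(lhs, rhs, dot)` and the function name are assumptions.

```python
from typing import FrozenSet, Set, Tuple

# An LR(0) item: (left-hand side, right-hand side, position of the dot).
Item = Tuple[str, Tuple[str, ...], int]


def needs_lookahead(state: FrozenSet[Item], terminals: Set[str]) -> bool:
    """True if an LR(0) state is inconsistent: it holds a completed item
    together with another completed item or with an item that can read a
    terminal, so the parser cannot choose its action without lookahead."""
    complete = [item for item in state if item[2] == len(item[1])]
    can_read = any(
        item[2] < len(item[1]) and item[1][item[2]] in terminals
        for item in state
    )
    return len(complete) > 1 or (len(complete) == 1 and can_read)
```

States for which the test is false can keep a plain LR(0) read or reduce action; only the remaining states need lookahead sets, which is where the savings in construction time and parser size come from.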

B. GRAMMAR 

The ALGOL-E grammar [6] was found not to be SLR(1), as was also the 
case for the reverse grammar. The required changes to the grammar were 
essentially minor and did not detract from or enhance the language. It 
was necessary to change the delimiters in a read statement from 
parentheses to vertical bars, and the ambiguity of the ALGOL assignment symbol, 
:=, was resolved by defining a new terminal symbol, <Setq>. <Setq> is 
transparent to the programmer, as are <Identifier>, <Number>, and 
<String>, and is similarly assigned in procedure SCAN of the compiler. 
Additionally, procedure calls were differentiated from function calls 
by requiring the reserved word CALL to precede the name of the subroutine. 
It was also necessary to delimit <Declaration Set> with periods vice 
semicolons. 

C. SPELLING CORRECTIONS 

Empirically, misspelled identifiers and reserved words form a 
significant percentage of errors; therefore, after appropriate modification, 
a spelling checking system was incorporated into the compiler [11]. An 
attempted error correction would fail if the reverse parser failed to 
return to the point of the input stream at which the forward parser was 
halted; hence, it was necessary to also enable spelling correction of 
misspelled reserved words in the reverse parser. Only reserved words 
are pertinent to the reverse parser spelling checking procedure, as it 
is concerned only with the syntax of identifiers, not their semantics, 
i.e., spelling. The spelling checking procedures incorporated were 
simplistic but demonstrative; only those errors involving one deleted 
or added character, one character in error, or two adjacent characters 
transposed were correctable. However, the complexity and sophistication 
are easily extended if one is willing to absorb the additional cost in 
terms of space and time. 
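The four correctable spelling errors amount to a single-edit test against the reserved words. The following Python sketch is an assumption-laden illustration, not the procedure adapted from [11]: the reserved-word subset is hypothetical, and accepting a correction only when exactly one reserved word matches is our own policy.

```python
from typing import Iterable, Optional

# Hypothetical subset of ALGOL-E reserved words, for illustration only.
RESERVED = ("BEGIN", "END", "IF", "THEN", "ELSE", "STEP", "UNTIL", "WHILE", "DO")


def within_one_edit(word: str, target: str) -> bool:
    """True if `word` differs from `target` by exactly one deleted or added
    character, one wrong character, or one swap of adjacent characters."""
    if word == target:
        return False
    lw, lt = len(word), len(target)
    if lw == lt:
        diffs = [i for i in range(lw) if word[i] != target[i]]
        if len(diffs) == 1:
            return True  # one character in error
        if len(diffs) == 2:
            i, j = diffs
            # two adjacent characters transposed
            return j == i + 1 and word[i] == target[j] and word[j] == target[i]
        return False
    if abs(lw - lt) != 1:
        return False
    shorter, longer = (word, target) if lw < lt else (target, word)
    # One character added or deleted: removing one character of the longer
    # string must give the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def correct_reserved(word: str, reserved: Iterable[str] = RESERVED) -> Optional[str]:
    """Return the reserved word that `word` misspells, if exactly one
    candidate is within one edit; otherwise None."""
    matches = [r for r in reserved if within_one_edit(word, r)]
    return matches[0] if len(matches) == 1 else None
```

Under this test THNE maps to THEN and BEGN to BEGIN, while PETS is not within one edit of STEP because four positions differ.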

D. PROCEDURE 

The model consists of two primary procedures, ERROR_ANALYZER and 
REVERSE_PARSER (see Figure 3). CAN_DO_WITHOUT_TOP, FP_INSTRSCT_RP, 
and CHECK_CONTEXT_OF_TOP_AND_TOKEN are called from ERROR_ANALYZER to 
determine if a symbol is a member of an applicable symbol set or to 
determine the symbol in the intersection of the applicable symbol sets 
of the forward and reverse parsers, respectively. The applicable symbol 
sets are the read and/or lookahead symbol sets for a particular 
forward or reverse parse state. Procedures TRANSPOSE, REPLACE, DELETE, 
and INSERT are called when a tentative error solution has been 
determined and the action implied by the procedure names is to be applied to 
the symbol at the top of the stack and/or the token symbol (the next symbol 
to be read). 
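The effect of the four editing procedures on the stack-top and token symbols can be modeled in a few lines. This Python sketch is hypothetical: `ErrorPoint`, its fields, the `"EOF"` fallback, and the log message formats are our inventions; the `history` list stands in for the record of corrector actions reported to the programmer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorPoint:
    top: str                                        # symbol on top of the forward parse stack
    token: str                                      # next symbol to be read
    rest: List[str] = field(default_factory=list)   # input following the token
    history: List[str] = field(default_factory=list)

    def transpose(self) -> None:
        """Swap the stack-top symbol and the token symbol."""
        self.history.append(f"TRANSPOSE {self.top!r} AND {self.token!r}")
        self.top, self.token = self.token, self.top

    def replace(self, symbol: str) -> None:
        """Replace the token symbol."""
        self.history.append(f"REPLACE {self.token!r} BY {symbol!r}")
        self.token = symbol

    def delete(self) -> None:
        """Drop the token symbol; the following symbol becomes the token."""
        self.history.append(f"DELETE {self.token!r}")
        self.token = self.rest.pop(0) if self.rest else "EOF"

    def insert(self, symbol: str) -> None:
        """Insert a symbol in front of the token symbol."""
        self.history.append(f"INSERT {symbol!r} BEFORE {self.token!r}")
        self.rest.insert(0, self.token)
        self.token = symbol
```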

As in the case of spelling correction, the scope of errors was 
restricted to single symbol insertion or deletion, one symbol in error, 
or two adjacent symbols transposed. 

Error analysis was restricted to only the symbol on the top of the 
stack and/or the token symbol. This restriction was imposed to 
preclude having to delete code that may have been emitted with the 
possible reduction of the second symbol in the stack prior to detecting 
the error. Further, the heuristic choice was made to first test for 
the possibility of deleting the error symbol. This was to reduce the 
occurrences of having to define a <Number>, <Identifier>, or <String> 
in cases where the error could also be explained as the omission of one 
of them. For example, if X:=Y++Z; was the input string, then one of the 
operators would be deleted vice inserting a <Number>, an <Identifier>, 
or any other expression. 

Figure 3 
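The effect of testing deletion before insertion can be seen on the X:=Y++Z; example. In this Python sketch a toy acceptor, `is_assignment`, stands in for the parsers (an assumption: the model consults parser symbol sets, not a whole-statement check), and `correct_at` with its `fillers` argument is hypothetical.

```python
from typing import List, Optional, Sequence

OPERATORS = {"+", "-", "*", "/"}


def is_assignment(tokens: List[str]) -> bool:
    """Toy acceptor for: identifier := operand (operator operand)* ;
    where operands are identifiers or unsigned integers."""
    if len(tokens) < 4 or tokens[1] != ":=" or tokens[-1] != ";":
        return False
    body = tokens[2:-1]
    if len(body) % 2 == 0:
        return False
    for i, t in enumerate(body):
        if i % 2 == 0 and not (t.isidentifier() or t.isdigit()):
            return False
        if i % 2 == 1 and t not in OPERATORS:
            return False
    return tokens[0].isidentifier()


def correct_at(tokens: List[str], i: int,
               fillers: Sequence[str]) -> Optional[List[str]]:
    """Try deleting tokens[i] first; only then try inserting a filler."""
    deleted = tokens[:i] + tokens[i + 1:]
    if is_assignment(deleted):
        return deleted
    for f in fillers:
        inserted = tokens[:i] + [f] + tokens[i:]
        if is_assignment(inserted):
            return inserted
    return None
```

For X:=Y++Z; inserting an identifier between the two plus signs would also give a valid statement, but because deletion is tried first the result is X:=Y+Z;.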

For purposes of implementation, the period and semicolon were 
defined as the primary keys for all cases, and EOF was designated the 
terminal key. The period was used as the primary key when the syntax analyzer 
was parsing declarations (see the ALGOL-E (Modified) grammar listing), 
and the semicolon was the primary key elsewhere. 

When the forward parser is stopped by an error condition it is in 
either a read or a lookahead state, and either the two top symbols on 
the stack or the top symbol and the lookahead symbol will constitute 
an illegal symbol pair. At this point, the history of the finite state 
machine for the grammar is known or may be determined directly from the 
current parse state and the set of read or lookahead symbols associated 
with that state. That is, given a symbol from the current applicable 
set, either the symbol will be stacked, indicating that the right part 
of some production is one symbol more complete, or the symbol just 
looked at will specify that the right part of a production has been 
completely read and a corresponding reduction will be made in the stack. 

The result of that reduction will in turn specify another symbol (a 
production left-part) toward completing the right part of some 
production entered further down in the stack. 
If the error symbol cannot be corrected as a misspelling then the 
error analyzing mechanism is engaged. Symbols are read into a symbol 
stack while the input stream is scanned for a key. The reverse parser 
is initiated in the state specified for the key and, operating with its 
own state stack, processes the symbol stack in reverse until it is 
stopped by an error that it cannot resolve as a misspelling or it 
reaches the point in the code stream at which the forward parser 
stopped. For example, reference Figure 4. 

Figure 4(a) depicts the configuration of the forward parsing stacks 
when an error (e) has been detected. The symbol e represents an error 
sequence of length n or less. If NEXT_SYMBOL(SP) is <Identifier> and 
is determined to be a misspelled reserved word then the correction is 
made immediately and parsing resumes; otherwise, the point of progress 
of the parse stack is marked (SAVE_SP, Fig. 4(b)). The input stream is 
read to the key and the reverse parser is started in the read state 
for that key (R_STATE_STACK(RP)). 

Figure 4(c) depicts the configuration of the stacks after the 
reverse parser has successfully parsed back to the error point and 
error analysis and correction begins. (Note: pointers SP and SAVE_SP 
have been interchanged for compiler execution considerations only.) 

When forward parsing resumes after error correction, symbols through 
the key are read from stack NEXT_SYMBOL. Only then does the parser 
return to reading the input stream. If the error cannot be resolved or 
the reverse parser is halted short of error e by additional errors, then 
the code from the error to the key (NEXT_SYMBOL(SAVE_SP)) is deleted. 
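The control flow just described (buffer the input through the key, reverse parse, then either resume from the buffer or delete to the key) can be sketched as follows. This is a simplified Python model, not the XPL procedures: the reverse parser and the correction step are passed in as hypothetical callables, and the buffer plays the role of stack NEXT_SYMBOL.

```python
from typing import Callable, List, Optional, Sequence


def recover(
    tokens: Sequence[str],
    error_index: int,
    in_declarations: bool,
    reverse_reaches: Callable[[List[str]], bool],
    correct: Callable[[List[str]], Optional[List[str]]],
) -> List[str]:
    """Return the symbols the forward parser should read next.

    tokens[error_index] is the symbol the forward parser rejected.
    reverse_reaches(buffer): True if a reverse parse started at the key gets
        back to buffer[0] without a further error (hypothetical callback).
    correct(buffer): corrected buffer, or None if unresolvable (hypothetical).
    """
    key = "." if in_declarations else ";"     # primary keys; EOF is the terminal key
    end = error_index
    while end < len(tokens) and tokens[end] not in (key, "EOF"):
        end += 1
    buffer = list(tokens[error_index:end + 1])  # error symbol through the key
    if reverse_reaches(buffer):
        fixed = correct(buffer)
        if fixed is not None:
            # Resume: read the corrected buffer first, then the input stream.
            return fixed + list(tokens[end + 1:])
    # Unresolved: delete from the error up to the key and restart there.
    return list(tokens[end:])
```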

Figure 5 depicts various configurations the two parsers may be in 
when the reverse parser has stopped. In conditions 5(i) and 5(j) the 
errors are defined to be too far apart; symbols $e_1$ through the key are 
deleted and forward processing is re-initialized at the semicolon. 

The conditions depicted in Figures 5(a) through 5(h) fall within the 
scope-of-error restrictions imposed, and error analysis may be performed. 
Note that in configurations 5(a), 5(b), 5(e), and 5(f) the reverse 
parser may or may not be in an error state, i.e., symbol $e_1$ may be 
syntactically correct as the left context of $a_1$. 

For configurations 5(a) through 5(h), symbols $a_0$, $e_1$, $e_2$, and 
$a_1$ are checked against the read or lookahead symbol sets for the forward 
and reverse parser states so as to make an appropriate deletion, 
insertion, or transposition. If the error cannot be so resolved, then a 
symbol is heuristically selected from the applicable forward parser 
symbol set without reference to the reverse parser and inserted in front 
of the error symbol. This heuristic approach may be applied four times 
before code is deleted. Control is then returned to the forward 
parser. 
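The bounded heuristic can be sketched as a loop. In this Python illustration the callbacks `forward_set` and `deterministic`, the first-in-table-order selection, and the rule of never repeating the previous choice are assumptions (the thesis leaves the selection order open); only the limit of four attempts comes from the model.

```python
from typing import Callable, List, Optional, Sequence

MAX_TRIES = 4  # heuristic insertions allowed before code is deleted


def heuristic_correct(
    error_tail: List[str],
    forward_set: Callable[[List[str]], Sequence[str]],
    deterministic: Callable[[List[str]], Optional[List[str]]],
    log: List[str],
) -> Optional[List[str]]:
    """error_tail: symbols from the error symbol up to the key.
    forward_set(inserted): symbols the forward parser would accept next,
        given the symbols inserted so far (hypothetical callback).
    deterministic(tail): deterministic correction attempt, or None.
    Returns a corrected tail, or None if the caller should delete code."""
    inserted: List[str] = []
    previous: Optional[str] = None
    for _ in range(MAX_TRIES):
        choices = [s for s in forward_set(inserted) if s != previous]
        if not choices:
            break
        previous = choices[0]                 # first candidate in table order
        inserted.append(previous)
        log.append(f"HEURISTIC INSERT {previous!r}")
        fixed = deterministic(inserted + error_tail)
        if fixed is not None:
            return fixed
    log.append("CORRECTION ABANDONED; DELETING TO KEY")
    return None
```

The `log` list doubles as the history of corrector actions that the diagnostics report when correction fails.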

Example 1: Configuration 5(a) 

Both the forward and reverse parsers are in read states after 
reading symbol $e_1$. Let the forward parser be in state $f_k$ and the reverse 
parser be in state $r_k$. Let $fss_k$ be the set of symbols associated 
with the forward parser read state $f_k$ and, similarly, let $rss_k$ represent 
the symbol set for $r_k$. 

If $a_1$ is a member of $fss_k$ and $a_0$ is a member of $rss_k$, then 
delete $e_1$ and continue normal processing. 

If the reverse parser (RP) is not in an error condition, then step 
RP to its next read or lookahead state ($r_{k+1}$). If the intersection of 
$fss_k$ and $rss_{k+1}$ is empty, then replace $e_1$ with the intersection of 
$fss_k$ and $rss_k$ (if this intersection is also empty, then replace $e_1$ 
with a symbol from $fss_k$) and continue processing; otherwise, insert the 
intersection of $fss_k$ and $rss_{k+1}$ in front of $e_1$ and continue. (Note: 
the fact that the reverse parser may not be in an error condition when it reads 
the symbol causing the error for the forward parser is very pertinent 
to the error analysis process. If it is not in 
error, then the initial assumption is that a symbol is missing in front 
of the error symbol. With that assumption made, a symbol that is 
syntactically correct for both parsers is required for insertion in 
front of the error symbol. This is accomplished by stepping the reverse 
parser to its next read or lookahead state, whichever occurs first. 
The insertion symbol is then taken from the intersection of the symbol 
sets associated with the two parse states.) 

Otherwise (RP is in an error condition), replace $e_1$ with the 
intersection of the FP and RP read state symbol sets if that intersection is 
not empty (if that intersection is empty, then insert a symbol from $fss_k$) 
and continue processing. 
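Example 1 reads naturally as a small decision procedure. The Python sketch below encodes it under stated assumptions: the subscripts in this part of the scan are damaged, so a0 is taken to be the symbol preceding e1 and a1 the one following it; the function name, the string-valued actions, and the alphabetical tie-break in `_pick` are our own choices rather than the thesis code.

```python
from typing import Optional, Set, Tuple

Action = Tuple[str, Optional[str]]


def _pick(symbols: Set[str]) -> str:
    """Repeatable choice from a non-empty symbol set (alphabetical here)."""
    return min(symbols)


def configuration_5a(fss_k: Set[str], rss_k: Set[str],
                     rss_k1: Optional[Set[str]], a0: str, a1: str) -> Action:
    """Example 1 rules. rss_k1 is the reverse parser's symbol set after it is
    stepped past e1, or None when the reverse parser is in an error condition.
    Assumes a0 precedes e1, a1 follows it, and fss_k is non-empty."""
    if a1 in fss_k and a0 in rss_k:
        return ("delete e1", None)
    if rss_k1 is not None:                     # RP accepted e1
        insert = fss_k & rss_k1
        if insert:
            return ("insert before e1", _pick(insert))
        replace = fss_k & rss_k
        return ("replace e1", _pick(replace) if replace else _pick(fss_k))
    replace = fss_k & rss_k                    # RP is in an error condition
    if replace:
        return ("replace e1", _pick(replace))
    return ("insert before e1", _pick(fss_k))
```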

Example 2: Configuration 5(d) 

Both of the parsers are in error conditions: the forward parser (FP) 
is in read state $f_k$ and RP is in lookahead state $r_k$. Again, let 
$fss_k$ and $rss_k$ be the symbol sets for the respective parse states. 

If $e_2$ is a member of $fss_k$ and $e_1$ is a member of $rss_k$, then 
transpose $e_1$ and $e_2$ and continue processing. 

If the intersection of the two symbol sets is not empty, then 
replace $e_1$ with a symbol from that intersection, delete $e_2$, and continue. 

If $e_2$ is a member of $fss_k$, then delete $e_1$ and continue. 

Otherwise (as a last resort), replace $e_1$ with a symbol from 
$fss_k$ and continue processing. 
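Example 2 can be encoded the same way. This Python sketch is hedged on two points: the names are ours, and the second rule is partly illegible in the scan, so it is read here as replacing e1 with a symbol from the intersection and deleting e2.

```python
from typing import Optional, Set, Tuple

Action = Tuple[str, Optional[str]]


def configuration_5d(fss_k: Set[str], rss_k: Set[str], e1: str, e2: str) -> Action:
    """Example 2 rules: both parsers are in error; e1 e2 is the error pair.
    Assumes fss_k is non-empty; ties are broken alphabetically."""
    if e2 in fss_k and e1 in rss_k:
        return ("transpose e1 e2", None)
    common = fss_k & rss_k
    if common:
        # Our reading of a partly illegible rule: one symbol replaces the pair.
        return ("replace e1 and delete e2", min(common))
    if e2 in fss_k:
        return ("delete e1", None)
    return ("replace e1", min(fss_k))   # last resort
```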

E. RESULTS 

Figure 6 illustrates some of the results of the error correcting system 
described. Generally, the system recognized a broad class of single 
symbol errors of insertion and omission and double symbol transposition 
errors. However, there was one small, well-defined class of error that, 
though recognized, could not be corrected while retaining the imposed 
restriction of not modifying the parse stack below the top symbol to 
achieve an error solution. 

For those constructs in which a statement was started with a 
reserved word followed by an identifier, the omission of the reserved 
word was not detected until the symbol following the identifier was read 
as it is syntactically correct for statements to also start with an 
identifier. In this instance, the true error point was two symbols from 
the top of the parse stack when the error condition was recognized. 

In the case where the reserved word was not omitted but merely 
grossly misspelled such that the symbol was interpreted as an 
identifier, the error condition arose when the following identifier was 
read. In this instance the true error point was one down from the top 
of the stack. 

For both situations, the omission and misspelling of the reserved 
word, by the time the error was discovered, the identifier following 
the error had already been reduced and associated code emitted. 

For the class of error conditions that was processed correctly, 
most conditions were corrected in a logical manner; logical in the sense 
that the corrections made were those that a human reader would be 
expected to make. A few configurations were made syntactically correct 
but not in the logical sense defined above. 

[Figure 6: sample output of the error correcting system. Surviving excerpts:
** CORRECTION ** INSERTING BETWEEN "READ" AND "WRITE" ON LINE 10
** CORRECTION ** REPLACING "ZZ" BY "THEN" ON LINE 11]

Example: FOR A := PETS 1 1 UNTIL... 

PETS, not recognized as a misspelling of STEP, was interpreted as 
an identifier resulting in the first "1" being replaced by STEP. 

For the case of self-embedding symbol pairs such as BEGIN ... END and 
( ... ), the omission or duplication of the leading or left symbol 
resulted in the deletion or insertion of the right symbol at a later 
point in the input stream. At first blush, this particular correction 
may seem fairly gross, but the deletion/insertion points were syntactically 
defined without regard for whatever the programmer's intended logic may 
have been. 

For those errors that the system could not correct, the history of 
the attempts at solution prior to abandoning the error and deleting code 
and a definition of the last error encountered by the reverse parser 
were made available to the programmer, thereby fairly isolating the 
error and defining the inability to make a correction. 

The time involved in correcting errors averaged about 0.015 seconds 
per error for the programs tested. 
V. CONCLUSIONS 



The syntax error correcting procedure proposed in this thesis is a 
viable system. While costs in terms of time and space are involved, 
its effects on a user's code are considerably more attractive than those 
of popular recovery systems employing automatic deletion of code to some 
stop symbol. Whereas the proposed system was defined to be grammar 
independent, the working model implemented was semi-automatic, using 
predefined start states for the reverse and forward parsers. It is 
recognized that these crossover points are significant with respect to 
fully automating the error correction process; however, they are the 
only points in the model that are language dependent. The correction 
procedures themselves are language independent; their only parameters 
are parse states and associated symbol sets defined by the parser 
constructor. 

The power in the procedure is attributable to the LR(k) parsing 
employed. Errors are examined in a very large context provided by the 
two disjoint state stacks of the forward and reverse parsers. Through 
LR(k) parsing, syntax errors are detected as the input stream is read 
and are precluded from the symbol stack. 

The model demonstrates that the proposed system detects and deter- 
ministically corrects a large class of errors thereby affording the 
programmer maximum exposure of his code to the analytical processes. 

A strong heuristic attempt to correct is provided for those cases in which 
the error cannot be resolved deterministically. Should error correction 
fail entirely, the system provides a good diagnosis and all residue of 
the error is removed, thereby ensuring against generating or cascading 
syntax errors through the remainder of the input stream. 
VI. EXTENSIONS 



The error correction system described in this thesis indicates 
several areas where worthwhile extensions can be made and where further 
analysis is necessary. 

A. KEY DEFINITION 

As keys seem to lend themselves to empirical definition, it 
would seem logical that they may also be defined analytically while the 
grammar to which they belong is being analyzed. An analyzer capable of defining 
a set of valid keys should also enable automating the error corrector 
by associating keys with states for both parsers and providing an 
automatic link to a key, and thereby to the error analyzing system, 
from any parser state in which a syntax error is detected. It may be 
feasible and practical to define a hierarchy of keys so that it would 
not be necessary to go beyond a minimum distance past the outer limit 
of the allowable error sequence. This would serve to minimize the 
likelihood of encountering another error, which would cause the corrector 
to abort. 

It may also be of value to define a grammar analyzer capable of 
recognizing hierarchies of key symbols and symbol strings and associat- 
ing these sets with unique parser states such that, for a given senten- 
tial form, dedicated keys are available to minimize the key-to-error 
distance and increase the probability that a key itself does not 
constitute an error. 
B. ERROR EXTENSION 



The current implementation severely restricts errors to single 
symbols except in the case of adjacent symbol transposition. A logical 
extension would be to extend the limits to provide for multiple-symbol 
errors. This would require either predefining and storing the legal 
symbol strings or defining a symbol string generator to be called as 
required. 
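A symbol string generator of the kind suggested could enumerate, lazily, every string of up to n symbols that the forward parser would accept from the error point; each candidate could then be compared with the error string. In this Python sketch `next_symbols` is a hypothetical callback standing in for a simulation of the forward parser from its error state.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def legal_strings(
    next_symbols: Callable[[Tuple[str, ...]], Sequence[str]],
    max_len: int,
) -> Iterator[Tuple[str, ...]]:
    """Yield every symbol string of length 1..max_len that the forward
    parser would accept from the error point, generated depth-first."""
    stack: List[Tuple[str, ...]] = [()]
    while stack:
        prefix = stack.pop()
        if len(prefix) == max_len:
            continue
        for s in next_symbols(prefix):
            string = prefix + (s,)
            yield string
            stack.append(string)
```

Since the number of candidates grows roughly as the symbol-set size raised to the power n, generating on demand rather than storing every legal string keeps space bounded, at the cost of time spent at each error.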

C. CLASSIC LR(k) VERSUS SLR(k) 

The classic LR(k) parser stops whenever it encounters an error 
symbol in either a read or lookahead state. The parser employed in the 
model defaults to the next read state in the event that the lookahead 
symbol is not a member of the symbol set associated with a particular 
lookahead state. That is, a successful lookahead defines a stack 
reduction; otherwise the decision is to stack (read) the lookahead 
symbol via the next logical read state. Only after the symbol is read 
is it determined whether or not it is an error symbol. It would be 
advantageous to be able to stop the parser in a lookahead state rather 
than in the next read state so as to keep the symbol preceding the 
error readily accessible at the top of the stack and available to 
participate in error analysis. 

D. STACK ACCESSIBILITY 

As inconvenient as it may be, there are constructs in the grammar 
in which an error is undetectable until the point where correction is 
needed is already in the stack. More analysis is needed to weigh 
the costs of incorporating a means of accessing the stack and, if 
necessary, deleting and regenerating code against the benefits of 
being able to correct this type of error. 
IF SHR(BYTE(LA_TABLE(SFS_#), BYTE_#), SHR_#) THEN RETURN 
END; 

END; /* MEMBER_FP_LA */ 